A Continuum-Based Approach for Tightness Analysis of Chinese Semantic Units

Authors

  • Ying Xu
  • Christoph Ringlstetter
  • Randy Goebel
Abstract

Chinese semantic units fall along a continuum of connection tightness, ranging from very tight, non-compositional expressions, through tight compositional words and phrases, to loose, more or less arbitrary combinations of words. We propose an approach for measuring connection tightness within this continuum, based on the document frequency of segmentation patterns in a reference corpus. We investigated a variety of corpora, including search engine snippets, search engine results derived from query logs, and standard corpora. Our tightness ranking of 300 phrases is quite close to their manual ranking, and non-compositional compound extraction achieves a precision as high as 94.3% on the top 1,000 4-grams extracted from the Chinese Gigaword corpus.
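To make the idea of "document frequency of segmentation patterns" concrete, here is a minimal sketch. It is not the authors' formula, which the abstract does not give. It assumes the reference corpus has already been segmented into token lists, and it uses a hypothetical ratio: the number of documents in which a candidate unit appears as a single token, divided by the number in which it appears either whole or split across adjacent tokens. The names `occurs_split` and `tightness` are illustrative only.

```python
from typing import Iterable, List


def occurs_split(tokens: List[str], unit: str) -> bool:
    """Return True if `unit` is spelled out by two or more consecutive tokens."""
    for start in range(len(tokens)):
        joined = ""
        for end in range(start, len(tokens)):
            joined += tokens[end]
            if not unit.startswith(joined):
                break  # this run of tokens can no longer match the unit
            if joined == unit and end > start:
                return True
    return False


def tightness(unit: str, segmented_docs: Iterable[List[str]]) -> float:
    """Illustrative score in [0, 1]; higher means the unit is usually kept whole.

    Assumption: score = df(whole) / (df(whole) + df(split)), where df counts
    documents, not occurrences. A document can contribute to both counts.
    """
    whole = 0
    split = 0
    for tokens in segmented_docs:
        if unit in tokens:
            whole += 1
        if occurs_split(tokens, unit):
            split += 1
    total = whole + split
    return whole / total if total else 0.0


# Toy corpus: each document is a list of tokens from some segmenter.
docs = [
    ["机器学习", "很", "有趣"],
    ["机器", "学习"],
    ["机器学习"],
]
score = tightness("机器学习", docs)
```

Under this assumed ratio, a unit that segmenters almost never split scores close to 1 and a loose combination scores close to 0, which gives one possible way to place candidates along the continuum and rank them.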

Similar resources

Application of the Tightness Continuum Measure to Chinese Information Retrieval

Most word segmentation methods employed in Chinese Information Retrieval systems are based on a static dictionary or a model trained against a manually segmented corpus. These general segmentation approaches may not be optimal because they disregard information within semantic units. We propose a novel method for improving word-based Chinese IR, which performs segmentation according to the tigh...

Study of Aspect Ratio Effect on Mechanical Properties Polymer/NanoComposite

Carbon nanotubes (CNTs) demonstrate unusually high stiffness, strength and resilience, and are therefore an ideal reinforcing material for nanocomposites. However, much work has to be done before the potentials of CNT-based composites can be fully realized. Evaluating the effective material properties of such nanoscale materials is a very difficult task. Simulations using molecular dynamics ...

Comparative Effectiveness of Semantic Feature Analysis (SFA) and Phonological Components Analysis (PCA) for Anomia Treatment in Persian Speaking Patients With Aphasia

Objectives: Anomia is one of the most common and persistent symptoms of aphasia. Although treatments of anomia usually focus on semantic and/or phonological levels, which both have been demonstrated to be effective, the relationship between the underlying functional deficit in naming and response to a particular treatment approach remains unclear. The aim of this study was to determine the rela...

Evaluation of “Mosaic 1 Reading”: A Microstructural Approach to Textual Analysis of Pedagogical Materials

To analyze and evaluate textbooks, researchers have either proposed scales and checklists to be filled by teachers and learners or conducted qualitative investigations of the match between SLA theories and textbook activities. This study, however, employs the microstructural approach of schema theory to scrutinize the reading passages of “Mosaic 1 Reading”. To this end, 17 passages of the textb...

An Executive Approach Based On the Production of Fuzzy Ontology Using the Semantic Web Rule Language Method (SWRL)

Today, the need to deal with ambiguous information in semantic web languages is increasing. Ontology is an important part of the W3C standards for the semantic web, used to define a conceptual standard vocabulary for the exchange of data between systems, the provision of reusable databases, and the facilitation of collaboration across multiple systems. However, classical ontology is not enough ...

Publication date: 2009